SVM-Based Negative Data Mining to Binary Classification
Authors
Abstract
The properties of a training data set, such as its size, distribution, and number of attributes, contribute significantly to the generalization error of a learning machine. A poorly distributed data set is prone to produce a partially overfitted model. The two approaches proposed in this paper for binary classification enhance the useful information in the data by mining negative data. The first, an error-driven compensating hypothesis approach, is based on Support Vector Machines with 1+k rounds of learning, in which the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on a new data set in which each label is a transformation of the label from the negative data set, and it further produces child positive and negative data subsets in subsequent iterations. This procedure refines the model created by the base learning algorithm, creating k hypotheses over k iterations. A prediction method is also proposed that traces the relationships between the negative subsets and the testing data set using a vector similarity technique. The second, a statistical negative-example learning approach based on theoretical analysis, improves the performance of the base learning algorithm, learner, by creating one or two additional hypotheses, audit and booster, to mine the negative examples output by learner. Learner employs a regular support vector machine to classify the main examples and to recognize which examples are negative. Audit works on the negative training data created by learner to predict whether an instance could be negative. The negative examples are strongly imbalanced. When audit is not accurate enough to judge learner correctly, a boosting learner, booster, is applied. Booster works on the subset of the training data on which learner and audit do not agree. The classifier used for testing is the combination of learner, audit, and booster.
For a specific test instance, the classifier returns learner's result if audit acknowledges learner's result and learner agrees with audit's judgment; otherwise it returns booster's result. The error ε of the base learning algorithm is proved to decrease from O(ε) to O(ε²).

[...] Zhang, for their kind guidance and precise advice during the process of my Ph.D. dissertation. The dissertation would not have been possible without their help. Secondly, I would like to thank Dr. Yi Pan and Dr. Yichuan Zhao for their much-appreciated support and assistance. Finally, I want to thank my family and friends for their support and belief.
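The way learner, audit, and booster are combined at test time can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the author's implementation. It assumes that a "negative example" is one that learner misclassifies, that audit returns True when it judges learner to be wrong on an instance, and that booster is trained on the instances audit flags. All function names and the toy classifiers are hypothetical stand-ins for trained SVM and boosting models.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Classifier = Callable[[Instance], int]  # returns +1 or -1
Auditor = Callable[[Instance], bool]    # True = "learner is probably wrong here"


def negative_flags(learner: Classifier, X: List[Instance], y: List[int]) -> List[bool]:
    """Mark the training examples that learner misclassifies (assumed meaning
    of 'negative examples'); these flags are what audit would be trained on."""
    return [learner(x) != label for x, label in zip(X, y)]


def booster_subset(audit: Auditor, X: List[Instance], y: List[int]) -> List[Tuple[Instance, int]]:
    """Select the examples on which audit disagrees with learner
    (assumed training subset for booster)."""
    return [(x, label) for x, label in zip(X, y) if audit(x)]


def combined_predict(learner: Classifier, audit: Auditor, booster: Classifier, x: Instance) -> int:
    """Return learner's answer when audit accepts it, otherwise booster's."""
    if audit(x):
        return booster(x)
    return learner(x)


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


# Hypothetical toy models standing in for the trained hypotheses.
def toy_learner(x: Instance) -> int:
    return sign(x[0])          # base hypothesis: looks only at feature 0


def toy_audit(x: Instance) -> bool:
    return abs(x[0]) < 1.0     # distrusts learner close to its decision boundary


def toy_booster(x: Instance) -> int:
    return sign(x[1])          # handles the region where learner is distrusted


X = [(2.0, -1.0), (-3.0, 1.0), (0.5, -1.0), (-0.5, 1.0)]
y = [1, -1, -1, 1]

flags = negative_flags(toy_learner, X, y)
predictions = [combined_predict(toy_learner, toy_audit, toy_booster, x) for x in X]
```

In this toy data set learner alone misclassifies the last two instances, audit flags exactly those two, and the combined classifier recovers the correct labels by deferring to booster on them.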
Similar resources
A new classification method based on pairwise SVM for facial age estimation
This paper presents a practical algorithm for facial age estimation from a frontal face image. Facial age estimation generally comprises two key steps: age image representation and age estimation. The anthropometric model used in this study includes computation of eighteen craniofacial ratios and a new accurate skin wrinkle analysis in the first step and a pairwise binary support vector...
A New Knowledge-Based System for Diagnosis of Breast Cancer by a combination of the Affinity Propagation and Firefly Algorithms
Breast cancer has become a widespread disease among young women around the world. Expert systems developed with data mining techniques are valuable tools in the diagnosis of breast cancer and can help physicians in the decision-making process. This paper presents a new hybrid data mining approach to classify two groups of breast cancer patients (malignant and benign). The proposed approach, AP-AMBFA, con...
The Use of the Binary Bat Algorithm in Improving the Accuracy of Breast Cancer Diagnosis
Introduction: The early diagnosis of breast cancer, a prevalent cancer among women, is a necessity in cancer research since it could simplify the clinical management of other patients. The importance of classifying breast cancer patients into high- or low-risk groups has led research groups in biomedical and informatics departments to evaluate and use computer techniques s...
Presenting a Multi-Objective Mathematical Optimization Model for Classification
In this paper we investigate the problem of data classification (one of the branches of data mining) in the form of a multi-objective mathematical programming model. The model that we present and investigate is a MODM problem. For the first time, based on the support vector machine (SVM) idea of maximizing the margin between two groups, a multi-criteria mathematical programming model was proposed for data m...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data, given the vast amount of data created by educational systems in particular. Data mining algorithms have been used in educational systems, especially e-learning systems, due to the broad usage of these systems. Providing a model to predict final student results in an educational course is a reason for usi...